China TCM News Article Scraper
Pricing: Pay per event
Scrapes articles from 中国中医药网 (cntcm.com.cn) — the official journal of China's National Administration of Traditional Chinese Medicine. Extracts title, author, publish date, source edition, full article body, and metadata tags (TCM topics, related herbs, integrative keywords) for each article.
Developer: BowTiedRaccoon (Maintained by Community)
Scrape full-text articles from 中国中医药网 (cntcm.com.cn) — the official journal of China's National Administration of Traditional Chinese Medicine. Extracts article title, author, publish date, source edition, and full article body (both plain text and HTML), plus keyword tags for TCM topics, integrative medicine crossovers, and related Chinese herbs.
What you get
Each scraped article includes:
- article_id — unique identifier from the URL
- title — article headline in Chinese
- category — mapped to one of: policy_regulation, clinical_research, herb_pharmacology, traditional_practice, news_industry, yangsheng_wellness
- publish_date — as printed on the page (YYYY-MM-DD)
- source — newspaper page or column (e.g., 中国中医药报7版, page 7 of the 中国中医药报 newspaper)
- author — byline
- body_text — full article body as plain text
- body_html — full article body as raw HTML
- tcm_topics — detected TCM keywords (e.g., 针灸 acupuncture, 中药 Chinese herbal medicine, 养生 health cultivation)
- integrative_keywords — crossover wellness terms in English or Chinese (e.g., yoga, qigong, 瑜伽 "yoga")
- related_herbs — Chinese herb names cited in the article
- source_url — canonical article URL
- scraped_at — ISO 8601 timestamp of when the article was scraped
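The README does not say how tcm_topics, integrative_keywords, and related_herbs are detected. The sketch below shows one plausible approach, plain substring matching against keyword lists. The lexicons and the function names find_terms and tag_article are hypothetical illustrations, not the actor's actual lists or code.

```python
# Hypothetical, tiny lexicons; the actor's real keyword lists are not documented.
TCM_TOPICS = ["针灸", "中药", "养生", "推拿", "经络"]
INTEGRATIVE_KEYWORDS = ["yoga", "qigong", "瑜伽", "气功", "冥想"]
RELATED_HERBS = ["人参", "黄芪", "当归", "甘草", "枸杞"]


def find_terms(text: str, lexicon: list[str]) -> list[str]:
    """Return lexicon terms that occur in text, in lexicon order."""
    lowered = text.lower()  # case-insensitive match for Latin-script terms like "Qigong"
    return [term for term in lexicon if term.lower() in lowered]


def tag_article(body_text: str) -> dict[str, list[str]]:
    """Build the three keyword fields from an article's plain-text body."""
    return {
        "tcm_topics": find_terms(body_text, TCM_TOPICS),
        "integrative_keywords": find_terms(body_text, INTEGRATIVE_KEYWORDS),
        "related_herbs": find_terms(body_text, RELATED_HERBS),
    }


if __name__ == "__main__":
    sample = "研究表明，针灸配合Qigong练习与黄芪、当归等中药可改善疲劳。"
    print(tag_article(sample))
```

Because Chinese text has no spaces between words, naive substring matching can produce false positives: 人参加会议 ("a person attends a meeting") contains 人参 (ginseng). A production tagger would likely need word segmentation or context rules.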
Use cases
- Clinical research: Track TCM policy announcements, trial reports, and regulatory updates from China's primary official TCM news source
- Pharma regulatory monitoring: Monitor Chinese herbal medicine policy and approval news
- Academic research: Sinology, comparative medicine, integrative health studies
- LLM training corpora: High-quality Chinese medical text from an authoritative institutional source
- Integrative medicine: Discover content where TCM intersects with yoga, qigong, and meditation
Inputs
| Field | Type | Description | Default |
|---|---|---|---|
| maxItems | Integer | Maximum articles to scrape (0 = no limit) | 10 |
| startDate | String | Only articles published on or after this date (YYYY-MM-DD) | — |
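A minimal sketch of the input semantics in the table: maxItems defaults to 10 and 0 means no limit, and startDate keeps only articles published on or after that day. The names ScraperInput, parse_input, and select_articles are hypothetical. The sketch also assumes the date filter is applied after each article's publish_date is known; the README does not say where the actor applies it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional


@dataclass
class ScraperInput:
    max_items: int = 10                 # maxItems: 0 means "no limit"
    start_date: Optional[date] = None   # startDate, parsed from YYYY-MM-DD


def parse_input(raw: dict) -> ScraperInput:
    """Validate a raw input dict using the documented field names and defaults."""
    max_items = int(raw.get("maxItems", 10))
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")
    start = raw.get("startDate")
    # date.fromisoformat raises ValueError for strings that are not YYYY-MM-DD
    start_date = date.fromisoformat(start) if start else None
    return ScraperInput(max_items=max_items, start_date=start_date)


def select_articles(articles: Iterable[dict], cfg: ScraperInput) -> Iterator[dict]:
    """Yield articles published on or after cfg.start_date, stopping at cfg.max_items."""
    emitted = 0
    for article in articles:
        published = date.fromisoformat(article["publish_date"])
        if cfg.start_date is not None and published < cfg.start_date:
            continue  # older than startDate: skip it but keep scanning
        yield article
        emitted += 1
        if cfg.max_items and emitted >= cfg.max_items:
            return  # limit reached; max_items == 0 never triggers this branch
```

Older articles are skipped rather than ending the scan, because the README does not state that sitemap order is chronological. With a guaranteed newest-first order, the loop could stop at the first article older than startDate.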
Notes
- Discovery uses the site's comprehensive /sitemap.txt (46,000+ article URLs)
- Server-rendered HTML: no JavaScript execution required
- Polite crawl with modest concurrency (5 concurrent requests)
- robots.txt is respected: a small number of sensitive articles disallowed in robots.txt are not included in the sitemap
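The notes above describe sitemap-based discovery, robots.txt compliance, and a concurrency cap of 5. The following is a standard-library sketch of those three ideas, not the actor's code. It assumes sitemap.txt follows the usual text-sitemap layout of one absolute URL per line. The URL paths and robots rules in the demo are made up. fetch is an injected coroutine, so no real HTTP client is assumed. parse_sitemap_txt, filter_allowed, and crawl are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar
from urllib.robotparser import RobotFileParser

T = TypeVar("T")


def parse_sitemap_txt(body: str) -> List[str]:
    """Text sitemaps list one absolute URL per line; drop blanks, junk, and repeats."""
    seen = set()
    urls = []
    for line in body.splitlines():
        url = line.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def filter_allowed(urls: List[str], robots_txt: str, user_agent: str = "*") -> List[str]:
    """Keep only URLs that the given robots.txt allows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


async def crawl(
    urls: List[str], fetch: Callable[[str], Awaitable[T]], concurrency: int = 5
) -> List[T]:
    """Run fetch(url) for every URL with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url: str) -> T:
        async with semaphore:  # waits while `concurrency` fetches are running
            return await fetch(url)

    # gather() returns results in the same order as `urls`
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


if __name__ == "__main__":
    # Hypothetical URLs and robots rules, not the site's real ones.
    sitemap = (
        "https://www.cntcm.com.cn/news/1.html\n\n"
        "https://www.cntcm.com.cn/private/2.html\n"
        "https://www.cntcm.com.cn/news/1.html\n"
    )
    robots = "User-agent: *\nDisallow: /private/\n"
    print(filter_allowed(parse_sitemap_txt(sitemap), robots))
    # -> ['https://www.cntcm.com.cn/news/1.html']
```

If the site's sitemap already omits disallowed URLs, the robots.txt filter here acts only as a safety net. It would still matter if the sitemap and robots.txt ever drifted apart.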